Multi-state predictive neural networks for text-independent speaker recognition
Abstract
Both Hidden Markov Models (HMMs) and neural networks have already been used in production systems for speaker identification and verification. Recently, [9] showed that ergodic multi-state HMMs do not outperform one-state "hidden" Markov models, i.e. Gaussian Mixture Models (GMMs), for speaker recognition. She showed that the important characteristic of these models is the total number of mixture components, not the number of states. These HMMs are therefore unable to exploit temporal information for speaker recognition. On the other hand, recent experiments have shown that, for neural predictive systems, modeling non-stationarity significantly improves performance [6]. We are interested here in the development of such models, which we refer to as multi-state predictive neural networks (MSPNNs). We study the ability of these systems to perform speaker identification and discuss the superiority of multi-state over one-state models. We provide results on 15 speakers from the TIMIT database.
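The multi-state idea can be illustrated with a minimal sketch, under assumptions the abstract does not spell out: each speaker model is a set of K state predictors, each predicting frame x[t] from x[t-1]; an utterance is scored by the minimum accumulated squared prediction error over an ergodic state path (a Viterbi search); and the speaker with the lowest error is identified. The linear "networks", the names `linear_predictor`, `viterbi_prediction_error` and `identify`, and the `switch_penalty` parameter are all hypothetical stand-ins, not the authors' method. A one-state model is simply the case K = 1.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# A predictor maps the previous frame x[t-1] to a prediction of x[t].
Predictor = Callable[[Vector], Vector]


def linear_predictor(gain: float) -> Predictor:
    """Hypothetical stand-in for a trained predictive network: x[t] ~ gain * x[t-1]."""
    return lambda x: [gain * v for v in x]


def sq_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Squared Euclidean prediction error for one frame."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))


def viterbi_prediction_error(frames: List[Vector], states: List[Predictor],
                             switch_penalty: float = 0.0) -> float:
    """Minimum accumulated prediction error over an ergodic state path.

    Every state may follow every other; changing state costs `switch_penalty`
    (hypothetical, assumed >= 0). With one state this is a one-state model.
    Requires at least two frames.
    """
    k = len(states)
    # cost[j]: best accumulated error of any path that ends in state j
    cost = [sq_error(states[j](frames[0]), frames[1]) for j in range(k)]
    for t in range(2, len(frames)):
        local = [sq_error(states[j](frames[t - 1]), frames[t]) for j in range(k)]
        best = min(cost)  # cheapest predecessor in any state
        # stay in state j for free, or switch from the cheapest state at a penalty
        cost = [local[j] + min(cost[j], best + switch_penalty) for j in range(k)]
    return min(cost)


def identify(frames: List[Vector], models: Dict[str, List[Predictor]],
             switch_penalty: float = 0.0) -> str:
    """Closed-set identification: the speaker whose model predicts best wins."""
    scores = {name: viterbi_prediction_error(frames, states, switch_penalty)
              for name, states in models.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 1-D "utterance": rising dynamics, then falling dynamics.
    frames = [[1.0], [2.0], [4.0], [8.0], [4.0], [2.0], [1.0]]
    models = {
        "A": [linear_predictor(2.0), linear_predictor(0.5)],  # two states
        "B": [linear_predictor(1.0)],                         # one state
    }
    print(identify(frames, models))
```

On this toy sequence the two-state model "A" predicts every frame exactly (accumulated error 0.0), while the one-state model "B" accumulates 42.0, so the script prints `A`. This mirrors, in a very reduced form, the argument that separate predictors for different stationary segments can capture temporal structure that a single predictor averages away.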
Similar sources
Speaker Recognition Using Gaussian Mixtures Models
The speech signal contains several levels of information. First, it contains information about the spoken message. At a second level, it also gives information about the speaker's identity, emotional state, and so on. The task of speaker recognition can be divided into two parts: speaker identification and speaker verification. Speaker identification is answering the question which one o...
Speaker recognition model using two-dimensional mel-cepstrum and predictive neural network
This paper describes a speaker recognition model using Two-Dimensional Mel-Cepstrum (TDMC) and a predictive neural network. The speaker model consists of two networks. The first one is a self-organizing VQ map (Kohonen's feature map). The second part is the predictive network, which learns transitional patterns on the feature map of each speaker's model. TDMC consists of averaged features and dynamic features ...
Text-Dependent Speaker Recognition Using Emotional Features and Neural Networks
This paper deals with a novel feature extraction method for text-dependent speaker recognition. Four female speakers were used to create a text-dependent database for Malayalam (one of the South Indian languages). The Discrete Wavelet Transform was used for feature extraction, and an artificial neural network was used for machine intelligence. In this work we used emotional features for speaker recogn...
Acoustic-phonetic decoding based on Elman predictive neural networks
In this paper we present a phoneme recognition system based on Elman predictive neural networks. The recurrent neural networks are used to predict the observation vectors of speech frames. Phonemes are recognized using the prediction error as the distortion measure in the Viterbi algorithm. The performance of the neural predictive networks is evaluated on both the training database and ...
Multi-State Time Delay Neural Networks for Continuous Speech Recognition
Alex Waibel, Carnegie Mellon University, Pittsburgh, PA 15213. We present the "Multi-State Time Delay Neural Network" (MS-TDNN) as an extension of the TDNN to robust word recognition. Unlike most other hybrid methods, the MS-TDNN embeds an alignment search procedure into the connectionist architecture and allows for word-level supervision. The resulting system has the ability to ma...
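The first related abstract splits speaker recognition into identification and verification. The two decision rules can be sketched as follows, assuming each enrolled speaker model returns a score for a test utterance where higher is better (for example a log-likelihood). The function names, the background (impostor) score and the thresholds are hypothetical illustrations, not taken from any of the listed papers.

```python
from typing import Dict, Optional


def identify_speaker(scores: Dict[str, float]) -> str:
    """Closed-set identification: pick the enrolled speaker with the highest score."""
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set variant (hypothetical threshold): return None when even the best score is too low."""
    best = identify_speaker(scores)
    return best if scores[best] >= threshold else None


def verify_speaker(claimed_score: float, background_score: float,
                   threshold: float) -> bool:
    """Verification: accept the claimed identity if its score beats a
    hypothetical background (impostor) score by at least `threshold`."""
    return claimed_score - background_score >= threshold


if __name__ == "__main__":
    # Hypothetical per-speaker log-likelihood scores for one test utterance.
    scores = {"alice": -10.5, "bob": -12.0, "carol": -13.25}
    print(identify_speaker(scores))                              # alice
    print(identify_open_set(scores, threshold=-10.0))            # None
    print(verify_speaker(scores["bob"], -12.5, threshold=1.0))   # False
```

Identification compares one utterance against every enrolled model, so its cost and error rate grow with the number of speakers; verification makes a single accept/reject decision against one claimed model.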
Publication date: 1995